Skip to content

Conversation

@pawosm-arm
Copy link
Contributor

@pawosm-arm pawosm-arm commented Apr 10, 2025

This is to backport the fix for #126836 to the release/20.x branch. Tested on the real thing.

lukel97 and others added 18 commits April 10, 2025 18:28
…lvm#130048)

From what I understand, we only create VPReductionRecipes for in-loop
reductions, and we don't currently support in-loop AnyOf reductions.

We only create VPReductionRecipes in the !PhiR->isInLoop() section of
adjustRecipesForReductions, and this comment from the initial patch
seems to confirm this
https://reviews.llvm.org/D108136#anchor-inline-1038338, so I think we
can remove this check in the condition logic.

I checked compiling SPEC 2017 with -prefer-inloop-predicates and the
added assertion doesn't trigger.
)

Currently fast() won't return true if all flags are set via setXXX,
which is surprising. Update setters to set all bits if needed to make
sure isFast() consistently returns the expected result.

PR: llvm#131321
This is split off from llvm#131300.

A VPReductionRecipe will never have a AnyOf or FindLastIV recurrence, so
when it calls createReduction it always calls createSimpleReduction.

If we replace the call then it leaves createReduction with one user in
VPInstruction::ComputeReductionResult, which we can inline and then
remove.
This patch change the parent of the VPReductionRecipe from
VPSingleDefRecipe to VPRecipeWithIRFlags and also print/get/drop/control
flags by the VPRecipeWithIRFlags. This will remove the dependency of the
underlying instruction.

This patch also add a new function `setFastMathFlags()` to the
VPRecipeWithIRFlags because the entire reduction chain may contains
multiple instructions. And the underlying instruction may not contains
the corresponding flags for this reduction.

Split from llvm#113903.
…pUtils. NFC (llvm#132014)

Split off from llvm#131300, this splits up RecurrenceDescriptor arguments so
that arbitrary recurrence kinds may be used down the line.
createInductionAdditionalBypassValues is only used for epilogue
vectorization now. Move it out of ILV, which means we do not have to
thread through ExpandedSCEVs and also don't have to track the bypass
values in ILV. Instead, directly create them if needed after executing
the epilogue plan. This moves more the epilogue specific logic out of
the generic executePlan.
Instead of executing the whole entry VPIRBB twice, first only execute
the VPExpandSCEVRecipes and replace their uses with the expanded
VPValue, which will be a live-in. This allows removing special logic in
VPExpandSCEVRecipe to support executing twice and allows moving the
ExpandedSCEVs map out of VPTransformState.

It will also allow adding other recipes to the entry VPBB in the future.
Update code to use VPBuilder, to simplify follow-up changes.
…CI (llvm#131300)

VPReductionRecipes take a RecurrenceDescriptor, but only use the
RecurKind and FastMathFlags in it when executing. This patch makes the
recipe more lightweight by stripping it to only take the latter two.

The motiviation for this is to simplify an upcoming patch to support
in-loop AnyOf reductions. For an in-loop AnyOf reduction we want to
create an Or reduction, and by using RecurKind we can create an
arbitrary reduction without needing a full RecurrenceDescriptor.
Splits off reduction printing tests, to limit growth and add test case
for printing find-last-IV (llvm#132689)
This moves the logic for computing the FindLastIV reduction result to
its own opcode. A follow-up patch will update the new opcode to also
take the start value, to fix
llvm#126836.

PR: llvm#132689
llvm#132690)

Keep the start value as operand of ComputeFindLastIVResult. A follow-up
patch will use this to make sure the start value is frozen if needed.

Depends on llvm#132689

PR: llvm#132690
FindLastIV introduces multiple uses of the start value, where in the
original source there was only a single use, when the epilogue is
vectorized.

Each use of undef may produce a different result, so introducing
multiple uses can produce incorrect results when the input is
undef/poison.

If the start value may be undef or poison, freeze it and use the frozen
value, which will be the same at all uses.

See the following scenarios in Alive2:
* Both main and epilogue vector loops execute, go to exit block: https://alive2.llvm.org/ce/z/_TSvRr
* Both main and epilogue vector loops execute, go to scalar loop: https://alive2.llvm.org/ce/z/CsPj5v
* Only epilogue vector loop executes, go to exit block: https://alive2.llvm.org/ce/z/5XqkNV
* Only epilogue vector loop executes, go to scalar loop: https://alive2.llvm.org/ce/z/JUpqRN

The latter 2 show requiring freezing the resume phi. That means we cannot freeze
in the preheader. We could move the freeze to the main iteration count check, but
that would be a bit fragile to find and other transforms can sink the freeze if needed.

Depends on llvm#132689
and llvm#132690.

Fixes llvm#126836

PR: llvm#132691
Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaks LoopUtils.h ABI in obvious ways and FMF.h ABI in less obvious ways.

Can this be fixed in a more minimal way than backporting 18 commits that include a lot of refactorings?

@tstellar tstellar moved this from Needs Triage to Needs Fix in LLVM Release Status Apr 14, 2025
@fhahn
Copy link
Contributor

fhahn commented Apr 14, 2025

I'm not sure if it is feasible to strip the fix down, as it depends quite a few refactoring patches.

For 20.x, it might be best just not enable epilogue vectorization for FindLastIV: #135666

@pawosm-arm
Copy link
Contributor Author

Thanks @fhahn for your comment and your patch. It is a good reason for closing down this one.

@pawosm-arm pawosm-arm closed this Apr 14, 2025
@nikic nikic moved this from Needs Fix to Done in LLVM Release Status Apr 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

5 participants